    Real-time selective sequencing using nanopore technology

    The Oxford Nanopore Technologies MinION sequencer enables the selection of specific DNA molecules for sequencing by reversing the driving voltage across individual nanopores. To directly select molecules for sequencing, we used dynamic time warping to match reads to reference sequences. We demonstrate our open-source Read Until software in real-time selective sequencing of regions within small genomes, individual amplicon enrichment and normalization of an amplicon set.
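
    To illustrate the matching step, the sketch below computes a subsequence dynamic time warping (DTW) cost between a short read signal and longer reference signals, so that the read may align anywhere within a reference. This is a minimal sketch under stated assumptions, not the Read Until implementation: the function names, the absolute-difference local cost and the toy signal values are all hypothetical, and real nanopore signals would presumably need normalization, with reference sequences first converted into expected signal levels.

```python
import math
from typing import Dict, Sequence, Tuple


def subsequence_dtw(query: Sequence[float], reference: Sequence[float]) -> float:
    """Lowest DTW cost of aligning the whole query to any stretch of the reference."""
    m = len(reference)
    # Row 0 is all zeros: the alignment may start at any reference position.
    prev = [0.0] * (m + 1)
    for q in query:
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            local = abs(q - reference[j - 1])  # hypothetical local cost
            curr[j] = local + min(
                prev[j],      # query advances, reference position repeats
                curr[j - 1],  # reference advances, query sample repeats
                prev[j - 1],  # both advance
            )
        prev = curr
    # Open end: the alignment may finish at any reference position.
    return min(prev[1:])


def best_match(query: Sequence[float],
               references: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return (reference name, DTW cost) for the closest reference."""
    costs = {name: subsequence_dtw(query, ref) for name, ref in references.items()}
    name = min(costs, key=costs.get)
    return name, costs[name]


if __name__ == "__main__":
    refs = {
        "target": [1.0, 2.0, 3.0, 2.0, 1.0, 0.5],
        "off_target": [5.0, 5.0, 6.0, 7.0],
    }
    read = [2.0, 2.0, 3.0, 2.0]  # time-stretched fragment of "target"
    print(best_match(read, refs))  # ('target', 0.0)
```

    In a Read Until-style loop, a read whose best cost against the wanted references stays above some threshold (a hypothetical parameter) would be ejected by reversing the voltage; choosing that threshold is outside this sketch.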

    Evolutionary history of endogenous Human Herpesvirus 6 reflects human migration out of Africa

    Human herpesvirus 6A and 6B (HHV-6) can integrate into the germline, and as a result, ∼70 million people harbor the genome of one of these viruses in every cell of their body. Until now, it has been largely unknown whether (1) these integrations are ancient, (2) they still occur, and (3) circulating virus strains differ from integrated ones. Here, we used next-generation sequencing and mining of public human genome data sets to generate the largest and most diverse collection of circulating and integrated HHV-6 genomes studied to date. In genomes of geographically dispersed, only distantly related people, we identified clades of integrated viruses that originated from a single ancestral event, and we confirmed this by using fluorescence in situ hybridization to directly observe the integration locus. In contrast to HHV-6B, circulating and integrated HHV-6A sequences form distinct clades, arguing against ongoing integration of circulating HHV-6A or “reactivation” of integrated HHV-6A. Taken together, our study provides the first comprehensive picture of the evolution of HHV-6 and reveals that integration of heritable HHV-6 has occurred since the time of, if not before, human migrations out of Africa.

    Streaming histogram sketching for rapid microbiome analytics

    Background: The growth in publicly available microbiome data in recent years has yielded an invaluable resource for genomic research, allowing for the design of new studies, augmentation of novel datasets and reanalysis of published works. This vast amount of microbiome data, together with the widespread proliferation of microbiome research and the looming era of clinical metagenomics, means there is an urgent need for analytics that can process huge amounts of data in a short amount of time. To address this need, we propose a new method for the compact representation of microbiome sequencing data using similarity-preserving sketches of streaming k-mer spectra. These sketches allow for dissimilarity estimation, rapid microbiome catalogue searching and classification of microbiome samples in near real time. Results: We apply streaming histogram sketching to microbiome samples as a form of dimensionality reduction, creating a compressed ‘histosketch’ that can efficiently represent microbiome k-mer spectra. Using public microbiome datasets, we show that histosketches can be clustered by sample type using pairwise Jaccard similarity estimation, which in turn allows rapid microbiome similarity searches via a locality-sensitive hashing indexing scheme. Furthermore, we use a ‘real life’ example to show that histosketches can be used to train machine learning classifiers that accurately label microbiome samples. Specifically, using a collection of 108 novel microbiome samples from a cohort of premature neonates, we trained and tested a random forest classifier that could accurately predict whether a neonate had received antibiotic treatment (97% accuracy, 96% precision) and could subsequently be used to classify microbiome data streams in less than 3 s. Conclusions: Our method offers a new approach to rapidly process microbiome data streams, allowing samples to be rapidly clustered, indexed and classified. We also provide our implementation, Histosketching Using Little K-mers (HULK), which can histosketch a typical 2 GB microbiome in 50 s on a standard laptop using four cores, with the sketch occupying 3000 bytes of disk space.
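
    The abstract does not spell out the sketching algorithm, so the following is a simplified sketch under stated assumptions rather than HULK's code. It streams reads into a k-mer spectrum of canonical k-mer counts, then builds a fixed-size sketch with improved consistent weighted sampling (ICWS), one well-known way to sketch weighted data so that the fraction of matching slots estimates the weighted (min/max) Jaccard similarity between two spectra. All function names, the seeding scheme and the sketch size are hypothetical; HULK's actual histosketch update rules, hashing and data structures may differ.

```python
import math
import random
from collections import Counter
from typing import Iterable, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_spectrum(reads: Iterable[str], k: int) -> Counter:
    """Stream reads and count canonical k-mers; k-mers with non-ACGT bases are skipped."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= {"A", "C", "G", "T"}:
                revcomp = kmer.translate(_COMPLEMENT)[::-1]
                counts[min(kmer, revcomp)] += 1
    return counts


def icws_sketch(spectrum: Counter, size: int) -> List[Tuple[str, int]]:
    """Improved consistent weighted sampling: one (k-mer, t) sample per sketch slot."""
    sketch = []
    for slot in range(size):
        best = (math.inf, "", 0)
        for kmer, weight in spectrum.items():
            # Random draws depend only on (slot, k-mer), so every sample sees the
            # same values for the same k-mer; this makes the sampling consistent.
            rng = random.Random(f"{slot}:{kmer}")
            r = rng.gammavariate(2.0, 1.0)
            c = rng.gammavariate(2.0, 1.0)
            beta = rng.random()
            t = math.floor(math.log(weight) / r + beta)
            y = math.exp(r * (t - beta))
            a = c / (y * math.exp(r))
            if a < best[0]:
                best = (a, kmer, t)
        sketch.append((best[1], best[2]))
    return sketch


def sketch_similarity(s1: List[Tuple[str, int]], s2: List[Tuple[str, int]]) -> float:
    """Fraction of matching slots: an estimate of the weighted Jaccard similarity."""
    return sum(x == y for x, y in zip(s1, s2)) / len(s1)


def weighted_jaccard(a: Counter, b: Counter) -> float:
    """Exact weighted Jaccard similarity: sum of minima over sum of maxima."""
    keys = set(a) | set(b)
    num = sum(min(a[key], b[key]) for key in keys)
    den = sum(max(a[key], b[key]) for key in keys)
    return num / den if den else 1.0
```

    Comparing two sketches costs time proportional to the sketch size, independent of sequencing depth, which is what makes catalogue searches cheap; `weighted_jaccard` is included only to check the estimate on small inputs.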

    The current landscape of nucleic acid tests for filovirus detection

    Nucleic acid testing (NAT) for pathogenic filoviruses plays a key role in surveillance and in controlling the spread of infection. Because filovirus infections share clinical features with those caused by other pathogens, early cases in an outbreak can be misdiagnosed. Tests that can identify a pathogen in the initial stages of infection are therefore essential to control outbreaks. Since the 2014–2016 Ebola virus disease (EVD) outbreak, several tests have been developed that are faster than previous tests and better suited to field use. Furthermore, the ability to test for a range of pathogens simultaneously has been expanded to improve clinical pathway management of febrile syndromes. This review provides an overview of these novel diagnostic tests.

    A Comparative Study of Human TLR 7/8 Stimulatory Trimer Compositions in Influenza A Viral Genomes

    Background: Variation in the genomes of single-stranded RNA viruses affects their infectivity and pathogenicity in two ways. First, viral genome sequence variations lead to changes in viral protein sequences and activities. Second, viral genome sequence variation produces diversity both in nucleotide composition and in the interactions between viral RNAs and host toll-like receptors (TLRs). A viral genome-typing method based on this type of diversity has not yet been established. Methodology/Principal Findings: In this study, we propose a novel genomic trait called the “TLR stimulatory trimer composition” (TSTC) and two quantitative indicators, Score S and Score N, named “TLR stimulatory scores” (TSS). Using the complete genome sequences of 10,994 influenza A viruses (IAV) and 251 influenza B viruses, we show that TSTC analysis reveals the diversity of Score S and Score N among IAVs isolated from various hosts. In addition, we show that low values of Score S are correlated with high pathogenicity and pandemic potential in IAVs. Finally, we use Score S and Score N to construct a logistic regression model to recognize IAV strains that are highly pathogenic or have high pandemic potential. Conclusions/Significance: Results from the TSTC analysis indicate that there are large differences between human and avian IAV genomes (except for segment 3), as illustrated by Score S. Moreover, segments 1, 2, 3 and 4 may be majo
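
    The abstract does not define which trimers count as TLR-stimulatory or how Score S and Score N are computed, so the sketch below covers only a generic first step that a composition-based trait presumably relies on: the relative frequency of each overlapping trimer in a genome segment. The function names are hypothetical, and the trimer set passed to `trimer_set_fraction` is left to the caller rather than guessed; it is not the paper's TSTC definition.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable

# All 64 RNA trimers, in a fixed order (AAA, AAC, ..., UUU).
TRIMERS = ["".join(p) for p in product("ACGU", repeat=3)]


def trimer_composition(seq: str) -> Dict[str, float]:
    """Relative frequency of each overlapping trimer in an RNA or cDNA sequence."""
    rna = seq.upper().replace("T", "U")
    counts = Counter(rna[i:i + 3] for i in range(len(rna) - 2))
    # Trimers with ambiguous bases (e.g. N) are excluded from the total.
    total = sum(counts[t] for t in TRIMERS)
    return {t: (counts[t] / total if total else 0.0) for t in TRIMERS}


def trimer_set_fraction(composition: Dict[str, float], trimer_set: Iterable[str]) -> float:
    """Summed frequency of a caller-supplied trimer set (hypothetical helper)."""
    return sum(composition[t] for t in trimer_set)


if __name__ == "__main__":
    comp = trimer_composition("AUGUUU")
    print(comp["UUU"])  # 0.25: AUG, UGU, GUU and UUU each occur once
```

    Converting T to U lets segments stored as cDNA and as RNA yield the same composition vector.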

    Genomic Characterization and High Prevalence of Bocaviruses in Swine

    Using random PCR amplification followed by plasmid subcloning and DNA sequencing, we detected bocavirus-related sequences in 9 out of 17 porcine stool samples. Using primer walking, we sequenced the nearly complete genomes of two highly divergent bocaviruses, provisionally named porcine bocavirus 1 isolate H18 (PBoV1-H18) and porcine bocavirus 2 isolate A6 (PBoV2-A6), whose NS1 protein sequences differed by 51.8%. Phylogenetic analysis indicated that PBoV1-H18 was very closely related to a ∼2 kb central region of a porcine bocavirus-like virus (PBo-LikeV) from Sweden described in 2009. PBoV2-A6 was very closely related to the porcine bocavirus genomes PBoV-1 and PBoV2 from China described in 2010. Among 340 fecal samples collected from asymptomatic swine of different ages in five Chinese provinces, the prevalence of viruses related to PBoV1-H18 and PBoV2-A6 was 45–75% and 55–70%, respectively, with 30–47% of pigs co-infected. Strains related to PBoV1-H18 were highly conserved, while those related to PBoV2-A6 were more diverse, grouping into two genotypes corresponding to the previously described PBoV1 and PBoV2. Together with the recently described partial bocavirus genomes labeled V6 and V7, a total of three major porcine bocavirus clades have therefore been described to date. Further studies will be required to elucidate the possible pathogenic impact of these diverse bocaviruses, either alone or in combination with other porcine viruses.

    Virus Identification in Unknown Tropical Febrile Illness Cases Using Deep Sequencing

    Dengue virus is an emerging infectious agent that infects an estimated 50–100 million people annually worldwide, yet current diagnostic practices cannot detect an etiologic pathogen in ∼40% of dengue-like illnesses. Metagenomic approaches to pathogen detection, such as viral microarrays and deep sequencing, are promising tools to address emerging and non-diagnosable disease challenges. In this study, we used the Virochip microarray and deep sequencing to characterize the spectrum of viruses present in sera from 123 Nicaraguan patients who presented with dengue-like symptoms but tested negative for dengue virus. We used a barcoding strategy to deep sequence multiple serum specimens simultaneously, generating on average over 1 million reads per sample. We then implemented a stepwise bioinformatic filtering pipeline to remove most human and low-quality sequences, in order to improve the speed and accuracy of subsequent unbiased database searches. By deep sequencing, we detected viral sequences in 37% (45/123) of previously negative cases. These included 13 cases with Human Herpesvirus 6 sequences. Other samples contained sequences similar to those of viruses in the Herpesviridae, Flaviviridae, Circoviridae, Anelloviridae, Asfarviridae, and Parvoviridae families. In some cases, the putative viral sequences were virtually identical to those of known viruses; in others they diverged, suggesting that they may derive from novel viruses. These results demonstrate the utility of unbiased metagenomic approaches for detecting known and divergent viruses in the study of tropical febrile illness.
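
    The abstract describes the filtering pipeline only as stepwise removal of human and low-quality sequences ahead of database searches. The sketch below shows one plausible shape for such a pipeline, assuming FASTQ-style reads with Phred+33 quality strings: cheap length, mean-quality and low-complexity checks run first, and an expensive host (human) check, passed in as a caller-supplied function, sees only the reads that survive. All thresholds, names and the low-complexity heuristic are hypothetical and are not the study's parameters.

```python
from typing import Callable, Iterable, Iterator, Tuple

# (read_id, sequence, quality string encoded as Phred+33)
Read = Tuple[str, str, str]


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a read; 0.0 for an empty quality string."""
    return sum(ord(ch) - offset for ch in qual) / len(qual) if qual else 0.0


def is_low_complexity(seq: str, k: int = 3, min_distinct_fraction: float = 0.3) -> bool:
    """Hypothetical heuristic: flag reads whose k-mers are mostly repeats (e.g. poly-A)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return True
    return len(set(kmers)) / len(kmers) < min_distinct_fraction


def stepwise_filter(reads: Iterable[Read],
                    is_host: Callable[[str], bool],
                    min_len: int = 50,
                    min_mean_q: float = 20.0) -> Iterator[Read]:
    """Yield reads that pass every step; cheap checks run before the host check."""
    for read_id, seq, qual in reads:
        if len(seq) < min_len:
            continue
        if mean_quality(qual) < min_mean_q:
            continue
        if is_low_complexity(seq):
            continue
        if is_host(seq):  # e.g. a wrapper around alignment to the human genome
            continue
        yield read_id, seq, qual
```

    Ordering the steps from cheapest to most expensive matches the stated goal of speeding up downstream searches: every read rejected early is one fewer read to align against the human genome or search against a database.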